
Adversarial Attack & Defense Visualization

Explore how adversarial attacks work and how to defend against them

Introduction
Interactive Demo
Defense Comparison
Learn More

Understanding Adversarial Attacks and Defenses

This interactive tool demonstrates how adversarial attacks can fool AI systems and how defensive techniques can make models more robust against these attacks.

What are Adversarial Attacks?

Adversarial attacks are specially crafted perturbations added to input data that cause machine learning models to make incorrect predictions. These perturbations are often imperceptible to humans but can completely change a model's output.

[Figure: Original Image (classified as 7) + Perturbation (amplified for visibility) = Adversarial Example (misclassified as 2)]

Fast Gradient Sign Method (FGSM)

In this demo, we implement the Fast Gradient Sign Method (FGSM), a common adversarial attack. FGSM works by:

  1. Taking a correctly classified input image
  2. Computing the gradient of the loss with respect to the input
  3. Creating a perturbation by taking the sign of this gradient
  4. Adding this perturbation (scaled by epsilon) to the original image

The result is an "adversarial example" that looks almost identical to the original image to humans, but is misclassified by the model.
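The whole attack fits in a few lines. Below is a minimal sketch in PyTorch (the framework choice and the [0, 1] pixel range are assumptions; the demo's own implementation may differ), with the four steps above marked in comments:

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, image, label, epsilon):
        """Create an FGSM adversarial example.

        model   : differentiable classifier returning logits (assumed)
        image   : batch of input images with pixel values in [0, 1]
        label   : batch of ground-truth class indices
        epsilon : perturbation strength
        """
        image = image.clone().detach().requires_grad_(True)

        # Steps 1-2: forward pass, then gradient of the loss w.r.t. the input
        loss = F.cross_entropy(model(image), label)
        loss.backward()

        # Step 3: the perturbation is the sign of the input gradient
        perturbation = epsilon * image.grad.sign()

        # Step 4: add the perturbation and clamp back to the valid pixel range
        adv_image = torch.clamp(image + perturbation, 0.0, 1.0)
        return adv_image.detach(), perturbation.detach()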

Defense Strategies

We'll explore several approaches to defend against adversarial attacks:

Adversarial Training

Training models on adversarial examples so they learn to resist attacks. Like immunization, exposing the model to attacks during training makes it more robust.
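A hedged sketch of a single adversarial-training step, reusing the fgsm_attack helper above; the 50/50 mixing ratio and the epsilon value are illustrative choices, not necessarily how the demo's robust model was trained:

    def adversarial_training_step(model, optimizer, images, labels, epsilon=0.1):
        """One optimization step on a mix of clean and FGSM adversarial examples."""
        model.train()

        # Craft adversarial versions of the current batch against the current model
        adv_images, _ = fgsm_attack(model, images, labels, epsilon)

        optimizer.zero_grad()
        # Train on both clean and adversarial inputs (an even mix is one common choice)
        loss = 0.5 * F.cross_entropy(model(images), labels) \
             + 0.5 * F.cross_entropy(model(adv_images), labels)
        loss.backward()
        optimizer.step()
        return loss.item()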

Input Preprocessing

Applying transformations to input images (like Gaussian noise) that disrupt adversarial perturbations while preserving key features for classification.
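A minimal sketch of this idea, assuming Gaussian noise with an illustrative scale sigma (the noise parameters the demo actually uses are not specified here):

    def noisy_predict(model, image, sigma=0.1):
        """Add Gaussian noise before classifying, to disrupt crafted perturbations."""
        noisy = torch.clamp(image + sigma * torch.randn_like(image), 0.0, 1.0)
        with torch.no_grad():
            return model(noisy).argmax(dim=1)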

Ensemble Defense

Combining predictions from multiple models, making attacks harder because they need to fool all models simultaneously.
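A sketch of the ensemble idea, averaging softmax outputs over several independently trained models (how the demo's ensemble members are trained is not shown here):

    def ensemble_predict(models, image):
        """Average the softmax outputs of several models, then take the argmax."""
        with torch.no_grad():
            probs = torch.stack([F.softmax(m(image), dim=1) for m in models])
        return probs.mean(dim=0).argmax(dim=1)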

Get Started: Click on the "Interactive Demo" tab to create adversarial examples and test defense strategies!

Interactive Adversarial Attack Demo

In this interactive demo, you can generate adversarial examples using the FGSM attack and see how different defenses perform against them.

Step 1: Select an Image


Step 2: Generate an Adversarial Example

Use the epsilon slider (default 0.1) to set the attack strength; the demo then shows the model's prediction on the adversarial image alongside the perturbation, amplified ×5 for visibility.
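Under the hood, these two steps correspond to something like the following usage of the fgsm_attack sketch from the introduction (the torchvision MNIST loader, the unnormalized [0, 1] inputs, and the name model for the demo's MNIST classifier are all assumptions):

    from torchvision import datasets, transforms

    # Step 1: select an image from the MNIST test set
    test_set = datasets.MNIST(root="data", train=False, download=True,
                              transform=transforms.ToTensor())
    image, label = test_set[0]
    image = image.unsqueeze(0)          # add a batch dimension: (1, 1, 28, 28)
    label = torch.tensor([label])

    # Step 2: generate the adversarial example at the chosen epsilon
    adv_image, perturbation = fgsm_attack(model, image, label, epsilon=0.1)
    print("clean prediction:      ", model(image).argmax(dim=1).item())
    print("adversarial prediction:", model(adv_image).argmax(dim=1).item())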

Defense Comparison

This section compares the effectiveness of different defense strategies against FGSM attacks of varying strengths.

Standard Model (Low Robustness)

The standard model has no defenses against adversarial attacks. It performs well on clean data but is highly vulnerable to adversarial examples.

Vulnerable at ε=0.1: 87%
Clean accuracy: 98%

Adversarially Trained Model (High Robustness)

This model is trained on adversarial examples, teaching it to resist attacks. Like a vaccine, exposure to attacks during training improves immunity.

Vulnerable at ε=0.1: 39%
Clean accuracy: 97%

Input Preprocessing Defense (Medium Robustness)

This defense adds random noise to inputs, which disrupts the carefully crafted adversarial perturbations while preserving key features.

Vulnerable at ε=0.1: 65%
Clean accuracy: 97%

Ensemble Defense (High Robustness)

The ensemble combines predictions from multiple models, making attacks harder since they must fool all models simultaneously.

Vulnerable at ε=0.1: 45%
Clean accuracy: 98%

Learn More: Adversarial Attacks & Defenses

Dive deeper into the concepts and techniques of adversarial machine learning.

Types of Adversarial Attacks

While this demo focuses on the FGSM attack, there are many other types of adversarial attacks:

  • Projected Gradient Descent (PGD): A more powerful, iterative version of FGSM (see the sketch after this list)
  • Carlini & Wagner (C&W) Attack: An optimization-based attack that produces very effective adversarial examples
  • DeepFool: Finds the minimal perturbation needed to cross the decision boundary
  • Jacobian-based Saliency Map Attack (JSMA): Modifies only the most influential pixels
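For comparison with the FGSM sketch above, PGD repeats the same signed-gradient step and projects the result back into an ε-ball around the original image. The step size alpha and iteration count below are illustrative defaults, not the settings of any particular paper or of this tool:

    def pgd_attack(model, image, label, epsilon, alpha=0.01, steps=10):
        """Iterative FGSM with projection onto the L-infinity epsilon-ball."""
        original = image.clone().detach()
        adv = original.clone()

        for _ in range(steps):
            adv.requires_grad_(True)
            loss = F.cross_entropy(model(adv), label)
            loss.backward()

            with torch.no_grad():
                adv = adv + alpha * adv.grad.sign()                              # gradient-sign step
                adv = torch.clamp(adv, original - epsilon, original + epsilon)   # project into the ball
                adv = torch.clamp(adv, 0.0, 1.0)                                 # keep valid pixel range
        return adv.detach()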

More Defense Strategies

Beyond the defenses demonstrated in this tool, researchers have developed several other approaches:

  • Defensive Distillation: Training a model to match the output of another model, making gradients harder to exploit
  • Randomized Smoothing: Adding random noise to inputs and averaging predictions to create certifiably robust classifiers
  • Feature Squeezing: Reducing the precision of inputs (for example, their bit depth) to remove adversarial perturbations (a short sketch follows this list)
  • Gradient Masking/Obfuscation: Hiding gradients to make gradient-based attacks harder (though this can be bypassed)
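As an illustration of one of these, feature squeezing via bit-depth reduction is only a couple of lines; the bit depth here is an illustrative choice:

    def feature_squeeze(image, bits=4):
        """Quantize [0, 1] pixel values to 2**bits levels to squeeze out
        small adversarial perturbations."""
        levels = 2 ** bits - 1
        return torch.round(image * levels) / levels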

Real-World Implications

Adversarial attacks have significant implications for AI security in real-world applications:

  • Autonomous Vehicles: Attackers could potentially place adversarial stickers on road signs to cause misclassification
  • Facial Recognition: Specially designed patterns on glasses or clothing could fool identity verification systems
  • Malware Detection: Adversarial techniques could help malware evade machine learning-based detection
  • Medical Diagnostics: Adversarial perturbations could cause misdiagnosis in AI-assisted medical imaging systems

Further Reading

  • "Intriguing properties of neural networks" - Szegedy et al. (2013) - First paper to identify the adversarial example phenomenon
  • "Explaining and Harnessing Adversarial Examples" - Goodfellow et al. (2014) - Introduced the FGSM attack
  • "Towards Deep Learning Models Resistant to Adversarial Attacks" - Madry et al. (2017) - Introduced PGD attacks and adversarial training
  • "Towards Evaluating the Robustness of Neural Networks" - Carlini & Wagner (2017) - Introduced the C&W attack
  • "Certified Robustness to Adversarial Examples with Differential Privacy" - Lecuyer et al. (2019) - Connection between robustness and privacy

Help & Information

What are adversarial attacks?

Adversarial attacks are inputs specifically designed to cause machine learning models to make mistakes. They work by adding carefully crafted perturbations that are often imperceptible to humans.

How does FGSM work?

The Fast Gradient Sign Method (FGSM) computes the gradient of the loss with respect to the input, then takes the sign of this gradient to create a perturbation. By adding this perturbation (scaled by epsilon) to the original image, it creates an adversarial example.

What is epsilon?

Epsilon controls the strength of the adversarial perturbation. Larger values create more visible perturbations that are more likely to fool the model but are also more noticeable to humans.

What are the defense strategies?

This tool demonstrates three defenses: adversarial training (training the model on adversarial examples), input preprocessing (adding random noise to disrupt crafted perturbations), and ensemble defense (combining the predictions of multiple models).

Can I test my own images?

Yes! You can use the "Draw your own" option to create and test your own handwritten digits.